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Abstract 

We outline a model and algorithm to perform inference on the palaeoclimate and palaeoclimate 
volatility from pollen proxy data. We use a novel multivariate non-linear non-Gaussian state space 
model consisting of an observation equation linking climate to proxy data and an evolution equation 
driving climate change over time. The link from climate to proxy data is defined by a pre-calibrated for- 
ward model, as developed in |Salter~T ownshend and Haslett ( 2012]) and [Sweeney (2012). Climatic change 
is represented by a temporally-uncertain Normal-Inverse Gaussian Levy process, being able to capture 
large jumps in multivariate climate whilst remaining temporally consistent. The pre-calibrated nature of 
the forward model allows us to cut feedback between the observation and evolution equations and thus in- 
tegrate out the state variable entirely whilst making minimal simplifying assumptions. A key part of this 
approach is the creation of mixtures of marginal data posteriors representing the information obtained 
about climate from each individual time point. Our approach allows for an extremely efficient MCMC 
algorithm, which we demonstrate with a pollen core from Sluggan Bog, County Antrim, Northern Ireland. 



Keywords: Palaeoclimate Reconstructions, Normal-Inverse Gaussian Process, Marginal Data Posteriors, 
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1 Introduction 



In this paper we aim to answer the following two questions: 

(1) What was climate during period in location s? 

(2) What was the volatility in climate during period (ti,tj) in location si 

For illustration, we set (ti,tj) — (0,14) thousand years before present (written 'ka BPQ, and s to be the 
central part of Northern Ireland. Were tj to be restricted to a smaller value we could use recent instrumental 
measurements for which we could obtain a fairly precise answer. When tj is allowed to grow large, we need to 
use proxy data: indirect records of climate stored in biological and mineral deposits. In this paper we outline 
a method to perform inference on the palaeoclimate (sometimes loosely referred to as 'reconstruction') and 
thus answer our questions of interest using pollen data, following on from Haslett et al. (20061. However, 



x We use the common convention in setting the 'present' to be 1950AD 



our methods are general and readily apply to other proxy data series or even multiple proxies simultaneously. 



Palaeoclimate reconstruction is a major focus of the Intergovernmental Panel on Climate Change (Jansen 



et al. 20071. Public interest, however, has largely been fuelled by the 'Hockey Stick' and 'climategate' 



controversies, e.g. IMann et al.| (119981 119991); IMcShane and Wynerl (120111) where (U,tj) was (0, l)ka BP and 



s was the Northern Hemisphere. Climate changes during the past millennium are relatively small and can 
be inferred with reasonable precision from proxies (e.g. tree rings) that are resolved annually. In contrast, 
the much older Younger Dryas period (12.8ka to 11.5ka BP) shows a rapid switching from warm to cold to 
warm. During this period ice core data from Greenland show abrupt warmings of up to 16 °C within decades 
(Jansen et al. 2007 P435). This size and rate of change is not captured well by the General Circulation 
Models (GCMs) which are used to predict future climate, nor by the precise proxies used to examine the 
past millennium. Pollen proxy data offer the best hope of resolving such sizeable past climate changes in 
locations other than Greenland. 



The model we propose differs from many recent studies performing inference on palaeoclimate because (a) 
it is non-linear and non-Gaussian in the relationship between climate and proxy, (b) we infer only climatic 
variables to which the proxy is sensitive, (c) we use real data to produce climate reconstructions rather than 
simulated 'psuedo-proxies', (d) we take account of chronological uncertainty, and (e) we allow for stochastic 
variability in climate. This approach, tackling many more sources of uncertainty than previously, is inspired 
by the SUPRANET groupQ of which many ideas are shared with |Tingley et al.| ( [2012| and |Huntley| ( |2012 1. 
We expand more on the importance of these various aspects in subsequent sections. 



The pollen data we use arise from cores taken beneath a lake or bog. The chosen site will have accumulated 
many thousands of years of pollen from plants and trees that have surrounded it. After extraction, the core 
will have been sliced into narrow depths di, each treated as a near instantaneous snapshot of the vegetation at 
that depth. From each slice a palynologist will count many different varieties of pollen and record the counts 
and depths. We make use of 28 of these counts, corresponding to pollen varieties that have been shown to 



be representative of a particular aspect or boundary condition on one or more aspects of climate ( Huntley 



1993). We write this 28- vector as yi. The information about climate from the pollen is obscured by: the 



delay in the reaction of plants and trees to climate change; the ability of the palynologist to recognise the 
desecrated remains of fossil pollen under a microscope; and the degree to which different species disseminate 
differentially recognisable pollen. The former of these is considered negligible over the timescales with which 
we work. The latter two are included as part of our likelihood, being both zero-inflated and highly non-linear 
in the relationship between climate and pollen. 



The practicalities of sampling lake sediment mean that the depths di may not be taken regularly throughout 
the core. Besides, even if the depths in the core are regularly taken, the times to which they correspond 
depend on the rate of sedimentation and may be very non-linear. The usual approach to transform depth 
into time is to take between 5 and 20 radiocarbon dates from various depths (again not necessarily regu- 
larly) , and use these to build a monotonic age-depth model which supplies times ti from every layer in the 
core; a number of statistical age-depth models have been proposed (e.g. |Bronk Ramsey 



2008 Haslett and 



Parnelll 2008 Blaauw and Christen 20111. A side-effect of using one of these models is that times ti are 



now uncertain, governed by a draw from a probability distribution ir{t\d) with t and d representing all of the 
times and depths in the core. Each draw is termed a chronology. 



Taken altogether, we can write our model in state space form as: 

Vi\ci ~ fe(ci,T>), i = l...,n 
(k = Cj_i +7i, i = 2,...,n 



(1) 
(2) 



2 Studying uncertainty in palaeoclimate reconstructions: http://caitlin-buck.staff.shef.ac.uk/SUPRAnet/ 
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where yi = y(ti) represents pollen data at time U, Cj = c(U) is palaeoclimate, / is a forward model parame- 
terised by 0, and 73 ~ N(0, v(ti-i, U)) wee residuals. Following convention, we call Equation[I]the observation 
equation, and Equation [2] the evolution equation. All of the parameters involved are multivariate: j/, being 
here of dimension 28 and Cj being of dimension 3. In our example in Section [6] n = 115. The parameters 
and the relationship / are well known and thus pre-calibrated from external data sources (a modern data 
set of pollen and climate T> — (c m ,y m )), and so our focus is on the posterior distribution n(c,v\y). Our 
particular interest is in the distribution of the incremental variance terms u, = v(ti-\,ti), representing the 
square of the volatility. We set these as being Inverse Gaussian ( Barndorff-Nielseri] 1997), written IG(r],(j)), 
yielding an infinitely divisible long- tailed stochastic process on c. 



From the age-depth model and the state space equations above we can form a posterior distribution: 

n— In n 

Tt(c,v,t,r),(j>\y,d) ocir(r),(f>) w(t\d) n(ci) n(Vi\r), 0, t) \\_ir{ci\ci-x, Vi) \\_^{yi\ci) 



(3) 



i=i 



i=2 



i=l 



where tt(j], <fi) is the prior distribution on (n, <f), n(t\d) is the age-depth model, 7r(ci) is the marginal prior on 
the first climate time point (which we treat as flat), TT(vi\i], cf>,t) is the Inverse Gaussian prior on increment 
variances, 7r(ci|cj_i, vi) is the evolution equation distribution, and 7r(j/,|ci) is the likelihood from the obser- 
vation equation distribution. 

Our focus in this paper is not on the likelihood ir(yi\ci), the version we use having been discussed in detail 
by Salter-Townshend and Haslett (2012), and [Sweeney (2012). Instead, we focus on how to fit the model 
defined by Equation [3] in a situation where repeated calls to the likelihood are costly. In fact we separate 
out the calculation of the likelihood into: 

(a) An initial calibration step, where we cut feedback from to form 7r(yj|c,) = J ir(yi\ci, 6)ir(6\T>) d0. From 
these we create marginal data posteriors (MDPs) of the form 7r(cj|y,). 

(b) A second model-fitting step where we use the MDP in place of the likelihood to create the posterior 
distribution 7r(c, v, t, T), <f>\y, d) as above. 

We justify the validity of step (a) in Section [4] the remainder of the paper is concerned with step (b). A 
Directed Acyclic Graph (DAG) for our complete model is shown in Figure[l]and discussed further in Section|5] 

A key aspect of our approach is that many of the parts of Equation [3] are treated as mixtures of Gaussian 
distributions. The marginal data posteriors are approximated as finite mixtures of Gaussians, which allows 
the posterior to be factorised, and removes the need for c to be estimated during the model fitting stage. 
We can then create sample paths of c(t) as a mixture of Gaussian distributions depending on the NIG pa- 
rameters and the MDPs. Similarly, the NIG prior can be seen as a mean-scale infinite mixture of a Gaussian 
distribution where the mixing distribution is Inverse Gaussian. We make use of this property to give fast 
updates of the increment variance parameters v. 



Our approach fundamentally differs from previous work in the use and formatting of data used as 'input' to 
an inference procedure. The work of Mann et al. (19981 and Mann et al. (19991 use 'products': pre-treated 



proxy data sets that have been aggregated and transformed on to a scale that is supposed to match an 
aspect of climate such as mean annual temperature. Often the details of such transformations are lost to 
the final inference so that the link between proxy data and climate, especially with respect to uncertainty, 
is not considered during the reconstruction of past climate. Similarly, other authors (e.g. Li et al. 2010 1 



have simulated their proxy data from climate model output, so that the input format is analogous to that 
of products. By contrast, we use an inference procedure that takes raw pollen data directly to perform 
inference on climate. However, the marginal data posteriors we produce might be considered as products 
though with a more fully quantified estimate of uncertainty. 
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Figure 1: A Directed Acyclic Graph for our palaeoclimate model. Broken lines indicate relationships where 
feedback is cut. See Section [5] for more details on the modelling relationships and Technical Appendix [A] for 
details on the notation. 



The paper is organised as follows. In Section[2]we examine previously used methods for creating palaeoclimate 
reconstructions. In Section [3] we present a simple version of our time series model based on ice core data. 
In Section]?] we introduce the mathematical justification of our method, and apply it to the palaeoclimate 
scenario in Section [5] In Section [6] we show the results of our model applied to a site in Northern Ireland. 
We examine some shortcomings of our model and discuss future directions in Section [7] The mathematical 
derivation is somewhat complex so we include a notational glossary in Technical Appendix [X] We include 
some of the more technical details of the algorithm in Technical Appendix [5] We perform model validation 
checks in Appendix |C| The resulting model is implemented in the R pac kage Bclim, which is available online 
at the R repository (R Foundation For Statistical Computing Austria 2011). Instructions for using the R 
package can be found in Technical Appendix [D] 



2 Previous work in statistical palaeoclimatology 



Here we explore the ways in which other authors have dealt with various aspects of palaeoclimate recon- 
struction. Whilst recent work has focussed on model-based approaches (particularly Bayesian). authors in 
the more distant past have used other methods. The most famous of these is the Mann et al. ( |1998 1999) 
papers using principal components regression or the regularised EM algorithm, though others such as [Hunt- 
ley (1993) and Ter Braak (19951 follow differing non-model based criteria. Other projects have focussed on 



certain aspects of palaeoclimate in relation to General Circulation Models (GCMs) or at particular points in 



time, such as PMIP (Braconnot and Harrison 2003), and the PALAEOQUMP project ( Masson-Delmotte 



et al. 20061. For brevity, we discuss in detail here only the statistical model-based approaches used by 



Haslett et al.| ( |2006[ ); |Tingley and Huybers| ( |2010[ ); |Li et al.| ( |2010[ ) (hereafter H06, T10 and L10 respectively) 
and focus on the key climatological issues: choice of climate variables, likelihood choices, prior distributions 
on climate, chronological modelling; and the most relevant statistical issues: fitting models with complex 
likelihoods, the NIG process, and the treatment of mixture models. A more thorough review of different 



approaches can be found in Tingley et al. (2012). 
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2.1 Issues of climatology 

A key modelling choice in palaeoclimate reconstruction is the aspects of climate to reconstruct; climate being 
defined as the 'average' of weather. Many authors (e.g. Mann et al. 1998, L10) have reconstructed mean 
annual temperatures, often over large spatial scales such as the Northern Hemisphere. This may seem like 
a logical choice, given that these are relatively easy to measure and relevant to human wellbeing. However, 
as recently summarised by Huntley (20121), the choice should instead depend on the sensitivity of the proxy 



data to changes in climate. Many biological proxies have shown that they are not sensitive to changes in 
mean annual temperature so reconstructions of this sort will be biased and overly uncertain. Instead, more 



relevant variables were proposed by, amongst others, Huntley (1993): 



• the Mean Temperature of the Coldest Month (MTCO), a measure of the harshness of the winter, 

• the Growing Degree Days above 5°C (GDD5, also known as the annual temperature sum above 5°C), 
a measure of the length of the growing season, 

• the ratio of Actual to Potential Evapotranspiration (AET/PET), a measure of the proportion of avail- 
able moisture. 

The first two of these were used for reconstruction by H06. Even then, however, missing a single important 
climate variable (in this case concerning moisture) will increase uncertainty, bias and possibly invalidate 
the conditional independence of the likelihood. Other non-biological proxies (e.g. Borehole temperatures, 



Brynjarsdottir and Berliner 2011) have shown sensitivity to surface temperature of only the surrounding 



microclimate. The choice of climate variables is even more important in multi-proxy reconstructions, where 
the differing sensitivities and temporal scales at which proxies are formed needs to be carefully understood 
before models can be created. 



Once the climate variables have been chosen, we can posit a likelihood Tr(y\c,8). We see the likelihood here 
as a data generating mechanism providing realistic proxy data given the relevant aspects of climate, often 
also called a forward model indicating the direction of causation (from climate to proxy). Numerous meth- 
ods for creating a link between climate and pollen proxy data have been suggested, see |Ohlwein and Wahl| 
(2012) for a full review. There are two approaches which stand out as being most popular. The first is to 
perform a calibration via modern analogue data, as used by H06, T10 and L10. The idea is to find data, here 
T>, for which both precise proxy and climate are available, and upon which the likelihood can be trained. 
The version used by T10, L10 requires the calibration data to be temporally overlapping, through which a 
temporal proxy/climate relationship might be inferred. The version used by HOG has no temporal overlap, 
but rather uses the climatological location of modern data in a multivariate psuedo-spatial model where the 
proxy is the response. This model has been substantially enhanced in |Salter-Townshend and Haslett| ( |2012[ ) 
and Sweeney (20121, and the latter is the version we use for our palaeoclimate model. The second approach 



is to use a physical model, as employed by Garreta et al. (2009) with mixed success. The physical model can 



provide a far richer relationship, taking account of differences in pollination mechanism and productivity 
between trees, and their response to complex inter-relations between climate variables. However, the models 
used are often extremely slow and may lack a stochastic element, making their use in Equation [3] difficult. 



In either scenario a likelihood is created such that: 



(4) 



where h is a function which accounts for the sensitivity of the proxy to climate with parameters Oh, and 
measurement error captured by e . Together = (0h,0 e ). Both parameters and their relationship through 



h are estimated from the calibration data, T>. In Sweeney (2012) this relationship is explicit; / is a zero- 



inflated Negative Binomial distribution, with Oh the parameters of a Gaussian Markov Random Field within 
a nested Dirichlet distribution specifying the proportions of each taxa. In L10 and T10 / is Gaussian 
(the proxy data y are 'products') with h a linear regression so that Oh represents regression parameters 
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and 8 e residual variances and covariances. Both L10 and T10 use simulated data ('pseudo proxies') so it 
is not stated exactly which transformation of y might link back to a real-world biological proxy such as pollen. 



In HOG the prior distribution on climate was a t s random walk which is not closed under addition. Both 
L10 and T10 use more conventional time scries methods, employing a Gaussian MA(2) and AR(2) model 
respectively. In T10 the prior distribution is allowed to vary spatially with a stationary Exponential covari- 
ance structure. In L10 the prior is expanded to include extra covariates corresponding to known forcings 
in a linear fashion. Unfortunately these structures are not applicable to the present situation where time 
is uncertain and continuous. In palaeoclimate reconstruction from pollen, uncertainty in time arises from 



radiocarbon dates taken from the pollen core which require non-linear calibration (see Buck et al. 19961. 



These contrast with the other proxy data used by L10 and T10 such as tree rings for which the times are 
known precisely. For our purposes we use the Bchron software of Haslett and Parnell (2008), being explicitly 
built for inclusion in palaeoclimate reconstructions. 



2.2 Issues of statistical modelling 

When the likelihood and prior distributions are Gaussian, as in T10 and L10, then Equation|3]is conceptually 
easy to solve via traditional MCMC or Particle-type methods. Things are complicated slightly by the high- 
dimensionality of the climate and volatility parameters and the sheer quantity of data. H06, however, have 
a non-linear model and thus combine the MCMC methods with some approximations which cut feedback 
between parts of the model. In particular, parameters 8 of the likelihood are learnt solely from the modern 
data. Thus information from any particular core contributes no 'further' information. In this paper we use 
a likelihood which is considerably more complex than that of H06, and is thus far too slow to solve using 
these methods. 



A number of alternatives have been proposed for fitting models with complex likelihoods, including but 
not limited to: Approximate Bayesian Computation (ABC; Beaumont et al. 20021, which relies solely on 



simulations from the likelihood and user-specified sufficient statistics to compute likelihood 'distances'; and 
statistical emulation (SE; |Kennedy and O'Hagan 2001| , which builds a stochastic approximation to the 



likelihood through a designed experiment. However, since we have reasonably good access to a well-defined 
likelihood it seems inappropriate to throw away much of the information in an ABC type algorithm. Sim- 



ilarly, whilst SE has been shown to work well in a high dimension climatological situation ( jHolden et al. 
2010), it is mostly used for deterministic models (though see Henderson et al. 2010). 



As stated earlier, our model can be usefully seen in multivariate state-space terms for which there are numer- 

for a review) . These proceed 



2010 



ous algorithms based on Sequential Monte Carlo (see, e.g. jCarvalho et aLj 

(in our notation) in a 'forward' or filtering stag^jby simulating from 7r(ci) and then forming Tr(ck\yi, ■ ■ ■ , yu) 
sequentially for k — 2, . . . , n. The filtering densities Tt(ck\yi, ■ ■ ■ , yk) are related to our marginal data pos- 



teriors, being posterior distributions on a subset of the data. However, their creation requires both the use 
of the evolution and the state equation for every time point. In the subsequent sections, we show that in 
our situation it is possible to produce a valid joint posterior via only the MDPs, which does not require 
calculation of the evolution density at the forward stage. 



A related problem in the estimation of state space models is known as the smoothing or 'backwards' step 
to produce a full joint posterior 7r(ci, . . . ,c n \yi, . . . , y n ). It is well-known that this is a challenging problem 
for particle methods ( |Doucet et al.[ 20081, especially in the case of large n and where inference on other 
parameters are also required. Indeed, our focus is on this smoothing stage of state-space modelling, though 
even this step (when using the nomenclature of particle filters) over-stresses the focus on the posterior mean 
and is less natural when the focus is, as here, on past volatility. By contrast, we are in a somewhat unusual 
situation in that we are happy to cut feedback so that the likelihood depends only on the state parameters 



3 Unrelated to the forward model / presented in Equationjl] 



G 



(climate c) with the distribution on 8 known. When the prior is intrinsic we show that it is possible to 
integrate out our state variable c and bypass many of the issues caused by particle degeneracy and sample 
size. We thus believe that our MDP approach is superior (achieving faster convergence even for large n), 
though we report elsewhere on the general properties of our fitting algorithm. 



We use the NIG as a prior distribution on differences Cj 



i-i across continuous time. The distribu- 
tion was introduced by B arndorff-Nielsen] (1997) as a mean-scale mixture of Gaussian distributions. If 
'vi ~ N(n + (3vi,Vi) and Vi ~ IG(a,S) where IG is in the Inverse Gaussian distribution (Betro and 



Rotondi[ 1991) , then Xi ~ NIG(fi, j3,a,S) with mean fj,, skew /3, and variance/tail parameters a, 5. Most 
importantly, the NIG distribution is closed under addition, so that if Xi ~ NIG(fij, (5, a, 6j) then ^jXj ~ 



NIG (^2 fai Pi a i Bayesian inference for the NIG distribution has been discussed by Karlis and Lillestol 

(2004), providing closed- form complete conditionals and thus a neat Gibbs algorithm. Some of these com 



plete conditionals transfer over to our model and are discussed further in Section [3J The form we use is that 
of the NIG process as discussed by Ribeiro ( 2003 ) , representing the random walk as a subordinated Brownian 



motion. The resulting NIG process is, of course, not the only long-tailed smoothing model available, forming 



part of the larger class of Levy processes (e.g. B arndorff-Nielsen and Shephard. 2001). Similarly, Fonseca 



and Steel (2011) provide a class of long-tailed smoothers for space-time models. Where discontinuities are 



expected, a change-point model might be used (e.g Wyse et al. 2010 1. However, the NIG process is extremely 



simple to work with, and provides many of the features we might expect to appear in a dynamical system 
such as climate. 



The finite mixture models of which we make use are taken from the Mclust algorithm inspired by |Fraley| 

and Raftery ( 2002 1 . Here, a multivariate random variable x arises as a mixture of G normal distributions 
1 • ^ ^ ■ 

with unknown means and variances, so that x — ^2„ = iPgN(^, g ,T g ) where the p g are mixture proportions. 



We use this technique to approximate our MDPs as mixtures of Gaussians via the EM algorithm (Dempster 



et al. 19771 which provides posterior modes for fJ. g ,r g . A full summary of the methods by which this type 
of model may be fitted can be found in Friihwirth-Schnatter ( 2006 1 . An infinite mixture version via the 



Dirichlet Process (e.g. Congdon 2001 ) would also be appropriate were we to be interested in simultaneously 
estimating the number of mixture components G. 



3 A motivating example: the GISP 2 ice core 



We illustrate our long-tailed smoothing approach and how we might answer our motivating questions in a 
simplified scenario based on the GISP2 ic e core of Stuiver ( |2000| . A similar but more sophisticated analysis 
has been undertaken by Davidsen (20101. The proxy data here are measurements of <5 18 0, the deviation 



from a standard of the ratio of the two stable isotopes of oxygen measured in laminated (layered), and thus 
precisely dated, ice. However, the data are not observed at regular time intervals. For illustration we assume 
an identity relationship between <5 18 and summer temperatures may be appropriate. A plot of the data 
covering the previous 120ka BP is shown in Figure [2j We write the response variable here as Oi = o(tj), and 
fit a model so that: 



Oi - Oj_! ~ N(fiAi + Pvi,Vi), v, ~ IG 2 (r]Ai,(f)Ai), i = 2,...,n (5) 
where A, = U — ti—\ and IG2 is the alternative parameterisation of the Inverse Gaussian distribution given 



by Betro and Rotondi (1991). Under this parameterisation, the parameter r\ can be seen as the mean unit- 
time variance v of the resu lting NIG distribution, with parameter </> controlling the variance of v. Following 
(2004), we set prior distributions so that f3 ~ N(0, t^ 1 ), [i ~ Af(0, r^ 1 ), 77 ~ Ga(a^, b v ) 
where Ga is a Gamma distribution. An MCMC algorithm now has Gibbs' steps with 



Karlis and Lillestol 
and 
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the following complete conditionals: 
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Here GIG is the Generalised Inverse Gaussian distribution, and ui — ^ Uj, U2 — E A; — fe^, M3 = ^(A^) 2 /?;,. 
Such a model is extremely quick and easy to fit and, from the resulting estimates of v, we can use an Inverse 
Gaussian bridge to obtain decadal estimates of climate volatilities. Writing v(ti,tj) as the squared v olatility 
associ ated with time increment (ti,tj), the bridge result for time t* with fj < t* < tj is provided by Ribeiro 
(2003) and gives: 



n(v(ti,t*)\v(U,t*) + v(t* ,tj) = v) oc v(U,t*)(v — v(ti,t*))exp 



4>n f (f -uf (tj-t*) 2 

2 \ v{ti,t*) v-v(U,t* 



(6) 



where the quantity inside the exponent is a Xi distribution. Exact methods for sampling from Equation 
[6] are outlined in Ribeiro ( 2003[ ). From the resulting estimates, we can look at the changing volatility in 
climate over time. Alternatively a return level plot could be created (see e.g. Coles 20011. Figure [3] shows 
the posterior distributions of (/x,/3,7y,(/)) and the 95% credible intervals for volatilities ^/v from fitting the 
above model to the ice core data after a run of 100,000 iterations, discarding 20,000 as burn-in, and thin- 
ning by 80, with all hyper-parameters (a v , b^, a^, 6 M , r^, rp) set to 0.01, indicating only vague prior knowledge. 



4 Inference for marginal-data and joint-data posteriors 

In this section we distinguish between the marginal-data posterior n(ci\yi) and the joint-data posterior 7r(c|y), 
and subsequently introduce one key aspect of the modelling, finite Gaussian mixtures, that allows us to retain 
rich models and yet to achieve considerable algorithmic speed. We extend the ideas of the previous section, 
so that the NIG process is applied to climates c which are now multivariate latent parameters, connected 
to the proxy data y via a likelihood 7r(y|c) = J 7r(y|#, c)-k{6\D) dO. The key benefit of using mixtures of 
MDPs is that the highest dimensional parameter (climate c) can be simply marginalised out of the model 
and re-created at a second stage, focussing inference only on the climate volatility parameters. 

It is first important to note the distinction between ir(ci\y) and ir(ci\yi). The former is the marginalised 
joint data posterior 7r(c, u, 77, (f>\y) (the left side of Equation |3|; the latter, our MDP, represents only the 
information in Ci obtained from the proxy data yi at that same layer i. It is trivial to note that the MDP 
can be obtained directly from the application of Bayes' theorem to the conditionally independent likelihood: 

There are two features of our model which make the use of such MDPs feasible. First, we are happy to 
cut feedback so that the parameters 9 are integrated out of the likelihood. Second the prior on climate is 
intrinsic so, provided that a flat prior is applied to 7r(ci), all other 7r(cj) are flat too, thus the denomina- 
tor of Equation [7] is a constant. Now the creation of 7r(cj|j/j) for every layer can be achieved in parallel 
at an offline early step of the process, with the lookup of 7r(cj|yj) now being near-instantaneous. The 
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Figure 2: The GISP2 ice core data of Stuiver (2000). The y-axis contains the stable isotope measurement 



(5 18 after standardisation to have mean and standard deviation 1. A high value corresponds approximately 
to warmer summer temperatures, and the lower to colder summer temperatures. The period to 14ka BP 
is highlighted to match with Figure [3] (right panel) which shows the volatilities from this period. 




Figure 3: Posterior plots from fitting the model defined by Equation [5] to the GISP2 ice core data. The 
left plots show the posterior distributions of parameters fj,, j3, r\ and 0, whilst the right-most plot shows the 
posterior distribution of v over the time interval to 14ka BP. The empty circles indicate the posterior mean 
whilst the vertical lines indicate the 95% credibility intervals. The century with the highest (posterior mean) 
climate volatility is 12.1k to 12.2k BP. 
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'cutting feedback' from likelihood parameters is an unremarkable and implicit assumption in most studies 
involving instrumental data; e.g., experimental data on temperature are not normally used to re-calibrate 
the thermometer that generated them. It is interesting to note that in many branches of applied statistics 
what is considered 'data' might actually be an MDP, though Equation[7]would never be explicitly considered. 

More formally, we now consider replacing the likelihood in Equation [3] with the product MDP Yii^i^lVi) 
which we write as 7TMDp(c|y) to avoid confusion with the marginalised joint data posterior n(c\y). Assuming 
the above requirements are met (i.e. intrinsic prior on c, and feedback cut between likelihood parameters 9 
and y) there are several further advantages to performing inference with the MDP: 

• Once n(ci\yi) has been created for every i, we never have to make another call to a potentially expensive 
product likelihood n"=i n {Vi\ c i)- 

• c is of lower dimension than y, and thus is easier to store in large quantities. 

• Different methods for creating the marginal-data posterior can be compared without any need to re- 
run the full likelihood computation. Although in this paper we use MDPs constructed by our research 
group, the methodology we describe can in principle receive input from any Bayesian procedure. 

• We can explore marginal-data posterior distributions, and remove them at will to cross- validate. 

In fact, such an approach may have advantages in other areas of Bayesian statistics. Clearly the approach 
does not require a flat prior on ci, although that is what we use: any prior distribution in which the joint 
and marginal prior distributions are available will yield a suitable joint posterior. We now elaborate with a 
simple case in which the likelihood, and thus also the MDP, is Gaussian. 

4.1 A simple example 

For illustration we consider a simple case with fixed equal time steps and a simple Gaussian random walk 
prior Ci — Cj_i ~ N(0,v). We assume a likelihood yi\ci ~ N(ci,rf ) with known Tj so the MDP is trivially 
Ci\Vi ~ ^{Uii T ^ 1 )- The target of the estimation is ?r(c,v\y). Some simple algebra shows that the posterior 
factorises: 7r(c, v\y) cx 7TMDp(c|?/)7r(c|u) = n(c\y, v)Tr(y\v)ir(v). The distribution n(c\y,v) here is Gaussian: 
c\y, v ~ N(c; VDy, V). The distribution ir(y\v) is also Gaussian on first differences of y (though this does not 
extend to the more complicated examples we present in the next section). Here D is the diagonal precision 
matrix with known components Tj and V = (D + W) . V in turn involves the precision matrix W of the 
random walk, being v~ 1 B T B where B is an (n — 1) x n differencing matrix with the first row structured 
as (— 1, 1, 0, . . . , 0), and subsequent rows structured similarly Note that despite the prior on c being im- 
proper all matrices involved in n(c\y,v) and ir(y\v) are of full rank. The precision matrix [D + W) is itself 
tri-diagonal and so allows fast inversion. Taken altogether, we can design a fast algorithm for computing 
the posterior by first performing inference on the parameter v (by marginalising over c) and subsequently 
creating a posterior for c via Tr(c\y,v). 



4.2 Multivariate Finite Mixtures for MDPs 

The model above can be extended to the non-linear case where the MDP is Cj|t/j <~ N((j>i — /i(?/i), d^ 1 = 
d(yi)^ 1 )- Following the same approach we obtain the joint data posterior n(c,v\y) cx n(c\y, v)ir(y\v), where 
again c\y, v ~ N(c, VD/j,, V). Now, however, n(y\v) is non-Gaussian, but the same marginalisation over c is 
possible. We can further remove the assumption that the marginal data posterior is Gaussian by extend- 
ing to the case where the MDP is treated as a finite mixture. More fully, we now consider MDPs which 
can be written, for an m-dimensional Ci, as 7r(c,|yj) = 2 9= i PigN (0%; fi ig , T^ 1 ) where we now use bold 
type to clearly indicate the use of multivariate climate at each layer i. The terms pi g are probabilities and 

^/XjpjT-r 1 ^ are vector and matrix respectively; these are all functions of the count data y i associated with 
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layer i. In the previous section, there was but one component in each mixture and we identified the sole fii 
with a scalar yi. 



The formulation may alternatively be expressed in terms of an underlying indicator random variable fcj, 
which with probability pi g selects component g. Conditional on ki, the MDP is simply Gaussian, and we 
have but a small extension of the previous section. We propose this as an approximation to the true MDPs. 
In the Bayesian situations we envision, the ■K(ci\y i ) are typically only available through (Markov Chain) 
Monte Carlo samples. The mixtures are then derived by fitting a finite Gaussian mixture to each sample. 
As detailed in Section [2] we use the MClust algorithm of Fraley and Raftery ( 2002 ) . This is a once-only 
exercise; there is no inference in the following on any of these parameters, which are taken to be known. The 
implications of the approximation are that we have access to the very simple results in the previous section. 



It follows that 

7T M Dp(c|y) = JJtt^I^) = ny^p ig N (c ig - n ig ,T-^) \ =J2 p K N ( c ^K> t k) ^ 

i i I S J K 

Here the index K runs over n-tuples {kf, i = 1, . . . n} and pj^ = g . Pig- If each mixture involves G com- 
ponents, then there are G n versions of K; the vectors /x^f hold the appropriate stacked /j, ig and the block 
diagonal matrices r have blocks Ti g . We abuse notation somewhat for simplicity, allowing the subscripts 
(ig) and K to differentiate the very differently dimensioned vectors {Hi g > aid matrices {jigi T K)- 

Again K is an index vector, with associated probability pj£- All terms are known. It is to be recalled that 
the above refers simply to the product of MDPs; it is not the joint distribution ir(c\y) that we seek here. 



In the following sections we replace the likelihood corresponding to the observation equation with the prod- 
uct MDP represented as mixtures of Gaussians. By integrating out c we thus bypass the evolution equation 
and are able to focus our inference on the volatilities at a first stage, updating them via standard Metropolis 
Hastings MCMC. Subsequently, we can produce climates c via simple mixtures of multivariate normal dis- 
tributions defined by the MDPs and the posterior distribution of the volatilities. 



5 Palaeoclimate modelling using MDPs 

To complete our palaeoclimate model, we combine the MDP approach presented above, which allows us to 
marginalise over climate c, with the NIG prior used in Section [3] The NIG prior is applied to each climate 
dimension individually, so that the time series are treated as independent a priori, though dependence is 
induced through the MDPs which capture relationships between the climate variables. At this stage we also 
include our age-depth model which allows for uncertainty in the chronology measurement. We use Bchron 



(Haslett and Parnell 20081 to provide very many samples of t\,.. . ,t n , each with the monotone restriction 
ti — ti-i > 0. In fact, the Compound Poisson Gamma Process used in Bchron could be considered another 
intrinsic prior and so might be updated as part of our model. However, we do not explore that approach 
further here, instead preferring to again cut feedback between the age-depth model and the other parameters. 



In summary, the following analysis stages are required to perform inference on the joint posterior distribution 
of climate and volatility: 

1. Creation of a suitable chronology via, e.g. Bchron. 

2. (P) Creation of MDPs for each layer in the core. 

3. (P) Approximation of MDPs as Gaussian mixtures. 

4. MCMC run to estimate volatilities v. 
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5. Posterior creation of climate c 



6. Interpolation of volatilities v and climates c on a pre-chosen grid 



This section outlines steps 4. 5 and 6. The inclusion of (P) indicates that this stage can be performed in 
parallel. The full set of parametric relationships are shown in the DAG of Figure [T] 



Our final model can be written as: 

G 



9=1 



(9) 



where all notation is as before except for the subscript j denoting climate dimension from 1 to m = 3. We 
require the posterior: 



n(c,v,ri,(f>,K,t\y,d) 



n 7r ( r ?j) 7r (^) 

3=1 



n n ^^iAvjAj^i) 

3=1 i=l 



x n(K) x n(t\d) x 

n 

f^7rMDp(c(ii)|yj, h) 



n n 7r ( c j(^+i)i c 3(*i)> w i3) 

3=1 <=1 

After some algebra (similar to that in Section [I| we obtain: 

N( Cj ; V jK D jKfijK , V jK )N(0; „ jK , DJ^) 



(10) 



w(c,v,ri,<l>,K,t\y,d) cx AjJ 

3=1 



N(0;V. K D. K „. K ,V. K ) 



(11) 



with a = K(<p, v ,K,t\d) tein^i 1 ^.a,-)] [n^nr^^r 172 



lj'felJ ■ 



j = Llli=L \ *j l /j ' t j 3 — jy 11 3 = 

Djjf = diag{r ljkl ,. . .,T njk J, V jK = (D jK + W^ 1 and W 7 = Y,i r ,, ] I } n ! where B t is the ith row 
of difference matrix S. Thus the posterior again factorises and we are left with a likelihood containing the 
parameters of interest <fi, 77, v. Once inference on these has been performed, we can then obtain c\y, . . . from: 

7r(c|v, • • •) = J n(c\K, t, v, y. d)ir{K, t, v, 0, J7|y, d) d(K, t, v, 0, rj) 

where the first term in the integrand is a normal distribution (given by the left hand side of the numerator 
of Equation 11 ) and the second is the posterior from the first inference stage. 

We leave the computational details of fitting the model defined in Equation [TT] to Technical Appendix [B] It 
suffices to say that the parameters v require Metropolis-Hastings updates whilst the complete conditionals 
for r) and <fi are available as Generalised Inverse Gaussian and Gamma distributions respectively. In fact 
we find it convenient to fix the values of rj and <fi to some suitable prior values. The computation time is 
substantially reduced because V = (D + W)^ 1 is tri-diagonal and thus requires only 0(n) inversion steps. 
Similarly, for a new proposed value of v^j, say u*., we can write V* = (V^ 1 + — ^j 1 ) ^i^J)~ 1 i 

where is the ith row of B. Inversions of V* can be computed via the fast Woodbury formula (e.g. 



Press et al. 2002). Once this inference stage is completed, we can perform draws of climate c from the 
predictive distribution, and subsequently interpolate using the Inverse Gaussian bridge (as in Section |3| to 
estimate v at intermediary time points. Interpolations for c can be found from the resulting Brownian Bridge. 



We incorporate time uncertainty in our algorithm by sampling a chronology from ir(t\d) at every iteration. 
This involves another cutting feedback assumption as the parameters of the chronology model, and hence 
the set of times t, are informed only by the radiocarbon dates in the core and not by any information on 
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the sedimentation from the pollen. We feel that this is reasonable though it is an area for further research 
given that the pollen production of certain varieties of plants or tree surrounding a lake may inform the 
sedimentation process. The general inclusion of time uncertainty in our model greatly reduces our ability 
to discern precise volatilities, as can be seen in our case study below. We discuss this property further in 
Section 

In Technical Appendix [C] we perform a number of validation steps to discern the performance of our model 
in ideal and non-ideal simulated situations. As the climate change prior (the evolution equation) is our main 
focus here and not the likelihood, we use a number of simplified versions of the full likelihood to run tests 
(though see Appendix [C| for the full description). We use the following steps to test the algorithm: 

1. Fix parameters r]j and <pj for j = 1,2,3. 

2. Generate from the IG(rjj, (f>j) for i = 1, . . . , n — 1 and j — 1, 2, 3. 

3. Generate climates from the evolution equation. 

4. Generate 'psuedo pollen' from fg with 9 fixed. 

We then perform the steps outlined at the start of this section to produce a joint posterior distribution 
on c, v\y. Note that for simplicity we assume fixed, unit time and we choose n = 100 slices which seems 
reasonable given the types of pollen data we analyse. We evaluate our model by looking at 90% and 50% 
coverage of the posterior compared to the true values. We perform the above steps 1,000 times in a variety 
of situations where various model assumptions are violated. In Appendix [C] we show that, even under quite 
severe violations, the coverage is still extremely adequate. We perform some further model checking steps 
as part of the next section. 



6 Case study: Sluggan Bog, County Antrim, Northern Ireland 



We now apply our model to a site from Northern Ireland, previously published in Smith and Goddard ( 1991 1. 



Our goal is to answer our questions of interest, namely estimating the changing climate and climate volatility 
over time, to which we have direct access via interpolated values of v created in stage 5 of our algorithm. 
We infer the three climate dimensions GDD5 (length of growing season), MTCO (harshness of winter) and 
AET /PET (availability of moisture) , as detai led in Section |2| 28 p ollen taxa were used to create MDPs via 
the methods of Sweeney| ( 2012[ ), with Bchron (Haslett and Parnell 2008[ ) used to create an age-depth model 
and thus posterior distributions of the time of each pollen layer. We fix the Inverse Gaussian parameters at 
the posterior means from the Ice Core data in Section [2] so that r] = 2.66 and <fi = 15.33. 

The creation of MDPs and their approximation as mixtures is a relatively fast step taking less than 5 minutes 
on a modern PC with several cores, though the former is strongly dependent on the number of layers n. For 
our core we have n = 115 layers, though other cores may have many more. The MCMC stage was run for 
100,000 iterations with a burn-in period of 20,000 and thinning by 40. The resulting 2,000 iterations were 



checked for convergence using the R package boa ( Smith 2005 1 . Posterior creation of climates, and their 
subsequent interpolation, are of negligible computational impact. A full run of all stages 1-6 took around 15 
minutes on an Intel Core-i7 3.4GHz processor with 8 cores and 16Gb of RAM. 

Figure [4] shows GDD5, MTCO and AET/PET posterior distributions for Sluggan interpolated on to a reg- 
ular centurial grid from to 14 ka years BP. A Younger Dryas type event is clearly visibly in MTCO, and 
there appear to be similar changes in both GDD5 and AET/PET. Unsurprisingly, the last 10k years BP are 
reasonably constant, much like the ice core data in Figure [2] Figure [5] shows the posterior distributions of 
interpolated volatilities derived via the Inverse Gaussian bridge. The uncertainties all peak around 12ka BP, 
corresponding to periods of known rapid climate change. There are similar peaks at about 6ka BP, possibly 
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corresponding to the start of agricultural activity known as the Neolithic. 

It is important to note that the volatilities in Figure [5] are given as 95% credibility intervals. These show 
that we have large uncertainty about the volatility which is unsurprising given that this is an estimate of 
a latent process with temporal uncertainty. In Figure [6] we show one of the climate dimensions (MTCO) 
with time uncertainty ignored. The MDPs are now vertical lines as they no longer exhibit any 'horizontal' 
uncertainty. Unsurprisingly the volatilities are somewhat more clearly defined. In Figure [7j we show MTCO 
with a Gaussian instead of NIG random walk (equivalent to letting <j> — > oo). Note that, whilst the large 
change in climate around Ilk years BP is still shown, it is associated with much lower volatility and thus a 
slower climatic change. 



7 Discussion 



The model we have presented performs inference on palaeoclimate in a more detailed and thorough fashion 
than previously possible. The foundation of the model is a state space formulation which explicitly separates 
out the dynamical system (climate; the evolution equation) from the forward model (the link between climate 
and proxy pollen data; the state equation). This idea, proposed originally in Haslett et al. (20061, had also 
been suggested by Tingley et al. ( 2012[ ). We have implemented and considerably expanded this approach 
and developed a fast algorithm which can perform inference on both climate and climate volatility through 
the use of mixtures of marginal data posteriors. 



There are several drawbacks to the model as proposed. First, it is conceivable that the mixture indicators do 
not cover a sufficient variety of tuples K to sufficiently learn the climate volatility parameters. Such a prob- 
lem will increase with n and G the number of mixture components used. However, our model validation steps 
show that this is almost never the case and coverage properties, even when G is under-estimated, still seem 
adequate. Another disadvantage is the cutting feedback assumptions, first between the likelihood parameters 
9 and the rest of the model, and second between the chronology model and the NIG process. The former 
seems most reasonable, as new cores are unlikely to impact much on the climate process given the strength 
of modern analogue data available. The latter, however, poses an interesting challenge, as if the sedimenta- 
tion process is also posed as an intrinsic prior it is feasible for inclusion in our MDP-style inferential approach. 

The modularity invoked by following our modelling assumptions is of interest in future extensions. This 
modularity enables various steps to be run in parallel, and also allows us to change modules as required. 
The steps for the different modules are given at the start of Section [5] For example, to produce the fixed 
time graph in Figure [6] steps 2 and 3 (creation of MDPs and their approximation as mixtures) did not need 
to be re-run, being entirely independent of any chronological uncertainty. Thus only the other steps were 
required. This has larger implications for future modelling as, for example, a new forward model can be 
used in place of the one we use with no other changes required to any of the other steps. The same applies 
for the chronology model, the Mclust mixture algorithm, and the evolution equation itself (though it would 
still need to be intrinsic). 

There are several enhancements which follow immediately and to which our algorithm may well still apply. 
These include: multi-proxy inference, development of new forward models, and spatio-temporal approaches: 

• A multi-proxy analysis of palaeoclimate would require multiple forward models describing the relation- 
ship between climate and the various proxies. Our state-space MDP approach would not be hindered 
by such an extension, as we could simply create MDPs for the different proxies and include them as 
standard, so that the product MDP now becomes, e.g. k^dp ( c \Vi) x ^MDp 2 ( c ly2)- However, care 
needs to be taken in selecting the aspects of climate to which the different proxies are responsive. If 
these are substantially different, it may be that an extra layer of the state space model is required to 
match the different climate variables appropriately. 
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Figure 4: A plot of the centurial interpolated GDD5 (length of growing season), MTCO (harshness of 
winter) and AET/PET (proportion of available moisture; scaled up to (0,1000)) over the period to 14ka 
BP. The blue 'blobs' represent the marginal data posteriors whereas the red bands represent summarised 
posterior interpolations of climates c. 



15 



Sluggan: GDD5 




i i i i i I i r 

2 4 6 8 10 12 14 



Age (k cal years BP) 



Sluggan: MTCO 




2 4 6 8 10 12 14 



Age (k cal years BP) 



Sluggan: AET/PETx 1000 




Age (k cal years BP) 



Figure 5: A plot of the posterior centurial interpolated volatilities for the three climate dimensions for 
Sluggan. The shaded areas represent 95% credibility intervals for the centurial volatility. 
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Figure 6: Plots of MTCO (harshness of winter) for the Sluggan core when time is assumed fixed. The left 
panel shows the posterior distribution of climate c whilst the right panel shows the 95% credible intervals 
for the centurial interpolated volatility. The legend for the left panel is the same as that of Figure [4] 




Figure 7: Plots of MTCO (harshness of winter) for the Sluggan core when the evolution equation is assumed 
to be a Brownian motion. The left panel shows the posterior distribution of climate c whilst the right panel 
shows the 95% credible intervals for the centurial interpolated volatility. The legend for the left panel is the 
same as that of Figure [4] 
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• A key development in palaeoclimate will be the advancement of probabilistic forward models describing 
the causal chain from climate to proxy data. The forward model we use is relatively simplistic in 
its description of the mechanics of climate/pollen interaction, though it is far more sophisticated 
in its description of the uncertainty relating to the counting of pollen data and the relationships 
between pollen varieties. We encourage the development of physical forward models provided they 



retain suitable stochastic elements. A recent example of such thinking is Tolwinski-Ward et al. (2011 1 



Finally, we might extend our state space approach into the spatial domain and give a richer evolution 
equation: 

y(s i ,t l )\c(s i ,t i ) ~ f g (c(si,U)), i = l...,n 
c(si,ti)\ {c(si,ii), . . . ,c(si_i,tj_i)} = c ~ Ck(c), i = 2,...,n 

where now both pollen and climate are indexed by space s and time t and the prior distribution Q is 
applied to climate change, parameterised by n. We might assume that this would use all observations 
from previous time points t±,..., so that a particle algorithm might now become more appropri- 
ate. The prior £ might be a simple stochastic climate model, or a richer version of our random walk 
including covariates and a spatial process. It is immediately obvious that c will no longer factorise 
out of the posterior, yet if £ remains intrinsic a Laplace approximation might still allow our algorithm 
to proceed, though with caveats as to the size of the approximation error. Finally, even in situations 
where the prior is not intrinsic, it may be that other non-Gaussian mixture arrangements will yield 
simple tractable forms. 



Performing inference on palaeoclimate over multiple sites may be possible by following the recently proposed 
methodology of Lindgren et al. (2011 1. In fact, the borrowing of strength from nearby cores may overcome 
one of our main issues: that of temporal uncertainty. It is certainly feasible that the constrained correlation 
of neighbouring sites will reduce temporal variability and thus provide more precise estimates of climate and 
possibly its associated volatility. Following this approach seems most promising in producing a pan-European 
map of palaeoclimate and its uncertainty. 
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A Notation 



Below is a table outlining the notation we use throughout the paper and especially in the mathematical 
derivations contained in Appendix [5] 
Cij — Cj(U) latent palaeoclimate at time ti for climate dimension j 

c vector (length m x n) of latent climate at all times and in all dimensions. Sorted first by 

time and then by dimension 
Cj = c(U) vector (length m) of latent palaeoclimate where each element represents each climate 

dimension at time U 

Cj vector (length n) of latent palaeoclimate where each element represents climate over all time 

for climate dimension j 
d Depths 
/ Forward model 
g group number/mixture component 
i layer number 
j climate dimension number 
k index number 

fcj index number for layer i taking values 1 to G 

m Total number of climate dimensions 

n Total number of layers 

Oj <5 18 measurement at time ti 

Pig probability of mixture component g for layer i 

ti time of layer i in thousands of years before present (ka BP) 

Vij variance of random walk for layer i, climate dimension j 

y pollen data 

Hi vector of pollen data for layer i 

Zi n- vector of zeros with 1 in the ith and -1 in the (i + l)th position 

D j{ Diagonal matrix of precisions for climate dimension j under mixture components K 
G Total number of mixture components 

I n Identity matrix of size n 
K Index set of k\, &2, • • • , k n 

V ■ j£ Variance matrix for climate dimension j under mixture components K. V . is = (D + 

Wj Random walk precision matrix for climate dimension j 

P Skew parameter for Inverse Gaussian distribution 

A, Time differences ti — U-i 

r\ Parameter of Inverse Gaussian Distribution 

9 Set of parameters governing forward model 

fi Mixture component means, also drift parameter for ice core model 

fiijg Mixture mean for layer i, climate dimension j, mixture component g 

fj, ig m- vector of mixture means for layer i when mixture component g is chosen 

V" K "-vector of mixture means for all layers on climate dimension j under mixture components 

1 K 

7r probability distribution 

r Mixture component precisions 

Tijg Mixture precision for layer i, climate dimension j, mixture component g 

Tig m x m-matrix of mixture precisions for layer i when mixture component g is chosen - often 
diagonal 
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B MCMC details 

We require posterior simulations of v via the Metropolis-Hastings algorithm; c now having been integrated 
out and hyper-parameters 0, 77 are treated as fixed. The complete conditional for Vij is independent of the 
other v's so below we drop the subscripts j for clarity. We update v by proposing new value v* , and hence 
create new matrix Qk* = Qk + ((v*)^ 1 — v ~ 1 ) z i z T where Qk = Vk ■ Calculate MH ratio as: 

N(0;V K D Kt i K ,V K ) g(vi\v*) IGjvt^y) (tj^j 
~ N(0;V K *D Kf iK,VK*) q(v*\ Vi ) IG(vi\<f>,r,) y ~h 

|Qjcj3cxp{-| [(VKD K fi K ) T Q K V K D K fj, K ]} q{vi\v*) IG{v*\4>,r}) (v*)~i 



^4cx P U \(V K D K „ K fQ K V K D K „ K -(V K *D K „ K fQ K *V K *D K „ K ]\ x «^ x 

Q K *|5 I 2 L J J /G^il^r?) 





i r 1 


[\Qk\_ 


expj-- 



Importantly (dropping the K subscripts for the time being) V*^ 1 — Q* — Q + Zj((u*) _1 — v^ 1 )zf can be 
manipulated via the Woodbury formulae: 

(A + UCV)- 1 = A -1 - A~ 1 U(C~ 1 + VA~ 1 U)~ 1 VA~ 1 
\A + UCV\ = ICT 1 + VA- 1 U\\C\\A\ 



Thus we have: 



M - \9\ 

|Q*| \Q + A7i Zi ((v*)-i-v^)zJ\ 

\Q\ 

IQWl + dvD-i-v^zfVztl 

= (l+m-'-v^zjvz,)- 1 

Similarly: 

V -V* = Q 1 (Q*)- 1 = Q 1 Q 1 + Q- l z t ([K)- 1 - vT- 1 ]- 1 + zjQ^z^' zjQ- 1 

= Q-^zt ([(v*)- 1 vZ 1 ]- 1 + zjQ-'z)' 1 zfQ- 1 

= Vz % ( [(u*)- 1 - vr 1 ] - 1 + zfV Zi ) ~* zfV 

= ([K*)" 1 - v- 1 ]' 1 + zJVzf 1 VzizfV 
Thus R v . can be computed from V without the need for V* : 



p _ (■> , r/„*\-i „-il y V 3 OY71 J (Dk^k) Vkz iZ% VkDkIik { g(vi\v*) IG(v*\<j>,T]) («*) 2 

-Kui — I 1 + \ (Vi ) — V i \Z i VKZi I exp < r- > X r X — — — — r- X — 

> \ 2[[(v*)-^-v^]- +zJV K z i ) I ?(«?!«*) ^#,7?) v -i 



Note that this involves many repeated calculations of zfVfcZi and 

(D K HK) T V K z t zJV K D K n K = (zJV K D K fi K ) T zfV K D K fi K . 

In R it is possible to use the limSolve package and the function Solve .tridiag to create V as it is the 
inverse of a tri-diagonal matrix. By solving the linear systems we can thus directly obtain, for example Vr-z^ 
or VkDkHk, rather than computing Vk separately. 
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Scenario 


Detail 


Proportion 


Proportion 






inside 90% CI 


inside 50% CI 


1 


Gaussian likelihood 


90.7% 


50.8% 


2 


ZIP likelihood 


90.8% 


47.7% 


3 


ZIP likelihood (too few mixture components) 


90.1% 


44.5% 


4a 


ZIP likelihood (under-estimated IG parameters) 


91.6% 


46.5% 


4b 


ZIP likelihood (over-estimated IG parameters) 


94.0% 


51.1% 



Table 1: Performance of the different model validation scenarios 



C Model validation 

In this section we determine the properties of our model fitting algorithm using simulated data under some 
idealised and non-idealised circumstances. To improve the speed of our tests we simplify the likelihood 
somewhat, though since this is not part of our inference stage in this paper we feel this is reasonable. 
Similarly we simulate data only observed on fixed, unit time. We consider 5 different scenarios: 

1. A simple Gaussian test that the parameters are identifiable when simulated from the model. We set 
n = 100 and m = 3. For j = 1, ... ,m we first simulate rjj ~ U(0.1, 10) and <j>j ~ [7(0.1, 10). For 
i = 1, . . . , n— 1 we then create v\j ~ JG^O-fji 4>j) and, for i = 1, . . . ,n, we create Cij — Cj_i j ~ N(0, Wy)- 

Finally we create Si ~ U {0.02, 2) and simulate pseudo pollen yij ~ N(cij, J S^ 1 ). From the pseudo 
pollen data and the Gaussian likelihood we obtain Gaussian MDPs (with no simulation or mixture 
approximation required) which are passed, with the values of <fij and rjj, to our MCMC functions to 
provide posterior distributions of 3-dimensional climate and volatility. 

2. A zero-inflated Poisson likelihood with 3 pollen taxa. The IG parameters, volatilities and climates are 
simulated as above, but we create pseudo-pollen via yu ~ ZIP(pi, \J a\c\ + a2c\) , j/j2 ~ ZIP(p2, \J a,\c\ + a^c^) 
and 2/i3 ~ ZIP(p 3l \J a\c\ + a,2c\ + i3C§). Here ZIP(p,r) is a zero-inflated Poisson distribution with 

zero inflation parameter p and rate r. We set pi,P2,P3 respectively as pj ~ C/(0,0.2) and ai,a2,a3 
as Poisson rate parameters simulated as the modulus of a normal distribution: ctj ~ | TV (0, 1 ) | . The 
pseudo-pollen data are turned into MDPs via importance sampling. The ZIP model gives MDPs that 
are quite often multi-modal. The MDPs are then approximated as mixtures using G — 5 mixture 
components. These mixture components are then passed to our MCMC algorithm to estimate climates 
and volatilities. 

3. Exactly as (2) but using only 2 mixture components. In many situations this will be a poor represen- 
tation of the MDP and thus may bias estimates of climate or volatility. 

4. Split into two parts: 

(a) Exactly as (2) but with the Inverse Gaussian parameters rjj and <pj given an underestimating 
multiplicative bias value simulated from f/(0.5, 1). 

(b) Exactly as (2) but with the Inverse Gaussian parameters iy and <f>j given an overestimating 
multiplicative bias value simulated from U(l, 5). 

We run each of the above 1000 times, and check the coverage properties of the climate posterior to see 
whether they lie within the 90% and 50% credibility intervals. Table [l] shows the results. Under each 
scenario the model seems to perform extremely well. 
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D R Package Bclim 



Bclim is available as part of the open source, free, statistical software R (|R Foundation For Statistical Com- 



puting Austria 2011). R is available to download from www.r-project.org. To install the package Bclim 
simply type install, packages ("Bclim") at the R prompt. The Bclim package is made up of four main 
functions (covering steps 2 to 6 as given in Section [5j, 2 plotting functions for (climate and climate volatility), 
and a function which runs all necessary steps in sequence. 

Example data to run the function can be downloaded from |http:/ /maths ci .ucd. ie/~parnell_a/Bclim, 



html. To run the Sluggan example shown in Section [6] the files should be downloaded via the commands: 
# Download and load in the response surfaces: 

urll <- 'http://mathsci.ucd.ie/~pcarnell_a/required.data3D.RData' 
download. file (urll , 'required_data3D . RData' ) 



# and now the pollen 

url2 <- 'http://mathsci.ucd.ie/~pcLrnell_a/SlugganPollen.txt' 
download . file (url2 , ' SlugganPollen . txt ' ) 



# and finally the chronologies 

url3 <- 'http://mathsci.ucd.ie/~pcLrnell_a/SluggcLQ_2chrons.txt' 
download . file (url3 , ' Slugganchrons . txt ' ) 

The response surfaces in the first command above are the pre-calibrated forward model parameters 0. The 
subsequent functions use the locations of the pollen and chronology file rather than loading them into RAM: 

# Create variables which state the locations of the pollen and chronologies 
pollen. loc <- paste (getwdO , VSlugganPollen.txt' ,sep=' ') 

chron.loc <- paste (getwdO , VSlugganchrons.txt' ,sep=' ') 



# Load in the response surfaces 
load ( ' required . data3D . RData ' ) 

The functions now proceed as BclimLayer which produces the marginal data posteriors, BclimMixPar or 
BclimMixSer which approximate the MDPs as mixtures (either in parallel or serial respectively), BclimMCMC 
which produces posterior chains of volatilities and climates, and Bclimlnterp which uses the Inverse Gaussian 
and Brownian bridges to interpolate climate. Finally BclimCompile produces a list object which can be 
passed to plotBclim or plotBclimVol for plotting: 

stepl <- BclimLayer (pollen. loc , required. data3D=required. data3D) 

step2 <- BclimMixPar (stepl) 

step3 <- BclimMCMC (step2, chron. loc) 

step4 <- Bclimlnterp (step2,step3) 

results <- BclimCompile (stepl , step2 , step3 , step4 , core . name=" Sluggan" ) 



# Create a plot of MTCO (dim=2) 
plotBclim(results ,dim=2) 

# Create a volatility plot 
plotBclimVol (results , dim=2) 

Each of the above functions has an associated help file which provides further information and options. 
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